Crowd-based Evaluation of English and Japanese Machine Translation Quality

Authors

  • Michael Paul
  • Eiichiro Sumita
Abstract

This paper investigates the feasibility of using crowd-sourcing services for the human assessment of machine translation quality on English and Japanese translation tasks. Non-expert graders are hired to carry out a ranking-based MT evaluation of utterances taken from the travel conversation domain. Besides a thorough analysis of the obtained non-expert grading results, data quality control mechanisms including “locale qualification”, “on-the-fly verification”, and “payment” are investigated in order to increase the reliability of the crowd-based evaluation results.
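The combination of ranking-based grading and “on-the-fly verification” lends itself to a compact illustration. The Python sketch below shows one plausible shape for such a pipeline: workers rank system outputs, workers who fail embedded gold items are dropped, and the surviving rankings are averaged into a system ordering. The data structures, system names, and the 0.8 accuracy threshold are illustrative assumptions, not details taken from the paper.

from collections import defaultdict

GOLD_THRESHOLD = 0.8  # assumed minimum accuracy on embedded gold items


def filter_workers(judgments, gold_answers):
    """Keep only workers who answer enough gold (verification) items correctly.

    judgments: list of dicts with keys
        worker, item, ranking (list of system names, best first)
    gold_answers: dict mapping gold item id -> expected best system
    """
    correct = defaultdict(int)
    seen = defaultdict(int)
    for j in judgments:
        if j["item"] in gold_answers:
            seen[j["worker"]] += 1
            if j["ranking"][0] == gold_answers[j["item"]]:
                correct[j["worker"]] += 1
    # Workers who never saw a gold item remain unverified and are excluded.
    trusted = {w for w in seen if correct[w] / seen[w] >= GOLD_THRESHOLD}
    return [j for j in judgments if j["worker"] in trusted]


def rank_systems(judgments):
    """Average each system's rank position over all retained judgments."""
    totals, counts = defaultdict(float), defaultdict(int)
    for j in judgments:
        for pos, system in enumerate(j["ranking"]):
            totals[system] += pos
            counts[system] += 1
    return sorted(totals, key=lambda s: totals[s] / counts[s])


if __name__ == "__main__":
    gold = {"g1": "SMT-A"}
    raw = [
        {"worker": "w1", "item": "g1", "ranking": ["SMT-A", "RBMT"]},
        {"worker": "w1", "item": "t1", "ranking": ["SMT-A", "RBMT"]},
        {"worker": "w2", "item": "g1", "ranking": ["RBMT", "SMT-A"]},  # fails gold
        {"worker": "w2", "item": "t1", "ranking": ["RBMT", "SMT-A"]},
    ]
    print(rank_systems(filter_workers(raw, gold)))  # -> ['SMT-A', 'RBMT']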


Similar Articles

The Correlation of Machine Translation Evaluation Metrics with Human Judgement on Persian Language

Machine Translation Evaluation Metrics (MTEMs) are at the core of Machine Translation (MT) engines, as engines are developed through frequent evaluation. Although MTEMs are widespread today, their validity and quality for many languages are still in question. The aim of this research study was to examine the validity and assess the quality of MTEMs from the Lexical Similarity set on machine tra...
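As a concrete (hypothetical) illustration of the validation step such a study implies, the sketch below computes segment-level Pearson and Spearman correlations between automatic metric scores and human ratings over the same segments. The score vectors are invented placeholders; a real study would substitute per-segment metric outputs and collected human judgments.

from scipy.stats import pearsonr, spearmanr

metric_scores = [0.31, 0.55, 0.12, 0.78, 0.40, 0.66]  # placeholder metric scores
human_scores = [2, 4, 1, 5, 3, 4]                      # placeholder human ratings (1-5)

r, r_p = pearsonr(metric_scores, human_scores)
rho, rho_p = spearmanr(metric_scores, human_scores)
print(f"Pearson r = {r:.3f} (p = {r_p:.3f})")
print(f"Spearman rho = {rho:.3f} (p = {rho_p:.3f})")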

Ebaluatoia: crowd evaluation for English-Basque machine translation

This dissertation reports on the crowd-based large-scale English-Basque machine translation evaluation campaign, Ebaluatoia. This initiative aimed to compare system quality for five machine translation systems: two statistical systems, a rule-based system and a hybrid system developed within the IXA group, and an external system, Google Translate. We have established a ranking of the systems und...

Graham, Yvette, Timothy Baldwin, Alistair Moffat and Justin Zobel (to appear). Can Machine Translation Systems be Evaluated by the Crowd Alone? Natural Language Engineering.

Crowd-sourced assessments of machine translation quality allow evaluations to be carried out cheaply and on a large scale. It is essential, however, that the crowd’s work be filtered to avoid contamination of results through the inclusion of false assessments. One method is to filter via agreement with experts, but even amongst experts agreement levels may not be high. In this paper, we present...
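The expert-agreement filter mentioned here can be sketched in a few lines: score each crowd worker against expert labels on a shared subset and keep only workers whose chance-corrected agreement (Cohen's kappa) clears a threshold. The data and the 0.4 cut-off below are assumptions for illustration, not the authors' actual procedure.

from sklearn.metrics import cohen_kappa_score

KAPPA_THRESHOLD = 0.4  # assumed cut-off; "moderate" agreement on common scales

expert = ["better", "worse", "better", "tie", "worse", "better"]
workers = {
    "w1": ["better", "worse", "better", "tie", "better", "better"],
    "w2": ["worse", "better", "tie", "worse", "better", "worse"],
}

trusted = {
    w: labels
    for w, labels in workers.items()
    if cohen_kappa_score(expert, labels) >= KAPPA_THRESHOLD
}
print(sorted(trusted))  # -> ['w1']: w1 agrees closely with the expert; w2 does not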

Crowd-based MT Evaluation for non-English Target Languages

This paper investigates the feasibility of using crowd-sourcing services for the human assessment of machine translation quality of translations into non-English target languages. Non-expert graders are hired through the CrowdFlower interface to Amazon’s Mechanical Turk in order to carry out a ranking-based MT evaluation of utterances taken from the travel conversation domain for 10 Indo-Europe...

Automatic Evaluation of Translation Quality for Distant Language Pairs

Automatic evaluation of Machine Translation (MT) quality is essential to developing high-quality MT systems. Various evaluation metrics have been proposed, and BLEU is now used as the de facto standard metric. However, when we consider translation between distant language pairs such as Japanese and English, most popular metrics (e.g., BLEU, NIST, PER, and TER) do not work well. It is well known ...
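The word-order sensitivity alluded to here is easy to reproduce with a standard BLEU implementation: an adequate translation whose constituents are legitimately reordered can score below an output that merely preserves local n-grams. The sentences below are invented for illustration; nltk supplies the BLEU computation.

from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

smooth = SmoothingFunction().method1
reference = [["yesterday", "i", "read", "the", "book", "in", "the", "library"]]

# Adequate translation with reordered (but grammatical) constituents.
reordered = ["in", "the", "library", "yesterday", "i", "read", "the", "book"]
# Output with a content error ("park") that nevertheless preserves local n-grams.
local_match = ["yesterday", "i", "read", "the", "book", "in", "the", "park"]

# The reordered-but-adequate hypothesis scores lower than the erroneous one.
print(sentence_bleu(reference, reordered, smoothing_function=smooth))
print(sentence_bleu(reference, local_match, smoothing_function=smooth))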


Journal:

Volume   Issue

Pages  -

Publication date: 2012